Software reliability model of open source software based on the decreasing trend of fault introduction

Open source software (OSS) has become one of the modern software development methods. OSS is mainly developed by developers, volunteers, and users all over the world, but its reliability has been widely questioned. When OSS faults are detected, volunteers or users report them to developers by email or over the network. After a developer confirms the fault, it is randomly assigned to a debugger, who may be a developer, a volunteer, or a user. These open source community contributors also learn as they remove faults. As detected faults are removed, the number of introduced faults decreases gradually. Therefore, this study proposes a software reliability model with a decreasing trend of fault introduction in the process of OSS development and testing. The validity of the proposed model and its accuracy in estimating residual faults are verified by experiments. The proposed model can be used to evaluate reliability and predict the remaining faults in the actual OSS development and testing process.


I. Introduction
Recently, open source software (OSS) has risen as an alternative to closed source software development. Since the development and testing process of OSS is mainly completed by developers, volunteers, and users all over the world, its reliability has been widely questioned. Although current OSS projects adopt the strategy of "early release and frequent release" [1] to improve software reliability, deciding when to actually release OSS remains a difficult problem. Moreover, this strategy relies heavily on human and subjective judgment, which is not conducive to improving the reliability of the actual OSS.
Considering that volunteers' or users' interests change in the process of OSS development and testing, Li et al. [2] proposed an OSS reliability model with the fault detection rate increasing first and then decreasing. Wang and Mi [3] studied the decline in the fault detection rate during OSS development and testing, and they proposed an OSS reliability model based on that decline. Wang [4] also proposed that fault introduction obeys a Pareto distribution and established the corresponding single-release OSS reliability model. In addition to the abovementioned single-release OSS reliability models, Singh et al. [5] proposed a multi-release OSS reliability model by studying the entropy change of OSS source code. Tandon et al. [6] proposed the corresponding multi-release reliability models of OSS by studying the distribution of OSS in the process of fault detection. Notably, single-release and multi-release OSS reliability models reflect two different modeling ideas. The former assumes that the faults in each version of OSS are independent of each other [7]. The latter assumes that faults in the current version are related to faults in previous versions, and that faults in the current version also affect fault detection and removal in later versions.
Although some of the abovementioned OSS reliability models can effectively evaluate the reliability of OSS under certain conditions, they ignore another phenomenon in the process of OSS development and testing, that is, the gradual decrease in the number of introduced faults over time. The reason why the number of introduced faults is decreasing is closely related to the learning phenomenon [8] of OSS debuggers and their hobbies, interests, and professional skills. In view of this situation, we propose an OSS reliability model based on fault introduction decline variation in this study. The validity and accuracy of the proposed model are verified by relevant experiments.
The contributions of this work are described as follows:
1. It is the first to propose that the number of introduced faults gradually decreases in the process of OSS development and testing. The number of faults introduced is related to the learning process, professional knowledge, interests, and hobbies of debuggers.
2. The decreasing changes of fault introduction are consistent with the actual development and testing process of OSS.
The rest of the paper is organized as follows: Section II explains the reasons for the decreasing changes of fault introduction during OSS development and testing. Section III describes the development process of the proposed model. Section IV deals with the model parameter estimation. Section V provides an experiment on the fitting and predictive performances of the proposed model compared with other models; there, the OSS fault data sets are introduced, model comparison criteria are described, comparison models are provided, and the fitting and predictive performances of the proposed model are analyzed and discussed. Section VI gives the sensitivity analysis of model parameters. Threats to validity of the proposed model are presented in Section VII. Section VIII contains a review of the literature. Conclusions are drawn in Section IX.

II. Reasons for the gradual reduction in fault introduction during OSS development and testing
We consider that, in the process of OSS development and testing, the reasons for the decline in fault introduction come from three aspects: first, learning factors [8]; second, interests and hobbies of developers or volunteers; third, the professional skills of developers or volunteers.

Learning factors
After an OSS fault is sent to the developer by email or over the network, the developer first confirms whether it is a real fault and then randomly assigns it to a debugger. While removing the fault, the debugger continuously gains experience and a deeper understanding of the corresponding OSS functions or products. As software development and testing proceed, fewer faults are detected, and the experience and knowledge of debuggers keep growing; that is, a learning process takes place. Therefore, as detected faults are removed, the number of introduced faults gradually becomes smaller. Fault introduction decreases with the development and testing of OSS.

Interests and hobbies of developers or volunteers
These debuggers are also fans of OSS; they have a strong interest in the current development of OSS projects. These factors drive them to devote themselves wholeheartedly to determining the cause of the fault, to try their best to remove the fault, and to attempt to reduce the introduction of new faults.

Professional skills of developers or volunteers
Debuggers themselves are generally experts in the field. They have deep insight, rich experience, and knowledge of the development of the software. When they remove faults, the probability of introducing new faults is very small. With the development and testing of OSS, they gain a deep understanding of the function and role of the software. When they remove faults, the number of introduced faults gradually becomes smaller.

III. Reliability model of OSS based on gradual decrease in fault introduction
OSS fault detection usually follows a nonhomogeneous Poisson process (NHPP). In other words, the fault detection process of OSS can be regarded as a counting process, which is represented by N(x). It is defined as follows:
\[ \Pr\{N(x) = k\} = \frac{[\xi(x)]^{k}}{k!} \, e^{-\xi(x)}, \quad k = 0, 1, 2, \ldots \tag{1} \]
where Pr{} and ξ(x) represent a probability and the mean value function, respectively. x denotes a random variable.
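As an illustration of the NHPP counting probability described above, the following minimal sketch evaluates Pr{N(x) = k} in its standard Poisson form for a given value of the mean value function. The function names and the exponential stand-in mean value function (with hypothetical parameters `a` and `b`) are assumptions made for the example; they are not the proposed model.

```python
import math


def nhpp_pmf(k, mean_value):
    """Pr{N(x) = k} for an NHPP whose mean value function equals mean_value at x."""
    if k < 0:
        return 0.0
    if mean_value == 0.0:
        # With zero expected faults, only k = 0 has positive probability.
        return 1.0 if k == 0 else 0.0
    # Evaluate in log space so that large k does not overflow the factorial.
    log_p = k * math.log(mean_value) - mean_value - math.lgamma(k + 1)
    return math.exp(log_p)


def stand_in_mean_value(x, a=100.0, b=0.05):
    """Hypothetical mean value function a * (1 - exp(-b * x)); illustration only."""
    return a * (1.0 - math.exp(-b * x))


if __name__ == "__main__":
    xi = stand_in_mean_value(10.0)
    # Probability of having detected exactly 40 faults by x = 10.
    print(nhpp_pmf(40, xi))
```

Summing `nhpp_pmf(k, xi)` over all k gives 1, which is a quick sanity check that the mean value function is being used as the Poisson mean.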
Assumptions of the proposed model are listed as follows:
1. Fault detection of OSS follows an NHPP, and the faults detected in OSS are removed immediately.
2. The number of faults detected in OSS is related to the number of remaining faults in the software.
3. New faults can be introduced when the faults in OSS are removed.
4. Debuggers learn in the process of fault removal in OSS.
5. The number of introduced faults gradually decreases in the process of OSS development and testing.
Eqs (2) and (3) are obtained from Assumptions 1 and 2. Herein, ξ(t), ψ(t), and ϕ(t) represent the expected cumulative number of detected faults, the fault detection rate function, and the expected total number of initially detected and introduced faults by time t, respectively. η, F(t), and C denote the expected total number of initially introduced faults, the fault introduction rate function obeying a distribution, and the expected total number of initially detected faults, respectively. Eq (4) is derived from Assumption 4, where ψ and γ represent a fault detection rate and a fault inflection factor, respectively. According to Assumptions 3 and 5, we give a distribution function, Eq (5), that describes fault introduction with a decreasing trend over time,
where μ is a scale parameter and t is the time variable. Solving Eqs (2)-(5) simultaneously yields Eq (6), and the proposed model is expressed mathematically as Eq (7). Detailed solution processes can be found in Appendix A.

IV. Model parameter estimation method
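In this study, least squares estimation (LSE) for nonlinear regression is used to estimate the parameter values of the OSS reliability models. It is defined as follows:
\[ S(\eta, \psi, \gamma, C, \mu) = \sum_{i=1}^{n} \left[ \xi(t_i) - O_{t_i} \right]^{2} \tag{8} \]
where O_{t_i} represents the number of observed faults by time t_i and n is the number of observations. Taking the partial derivatives of Eq (8) with respect to each parameter and setting them to zero gives
\[ \frac{\partial S}{\partial \eta} = \frac{\partial S}{\partial \psi} = \frac{\partial S}{\partial \gamma} = \frac{\partial S}{\partial C} = \frac{\partial S}{\partial \mu} = 0 \tag{9} \]
By solving Eq (9) simultaneously, we can obtain the estimated values of the parameters of the proposed model. Notably, we use LSE instead of maximum likelihood estimation (MLE) to estimate the parameter values of the models. In some cases the maximum likelihood function has no solution, so the parameter values of the model cannot be estimated, which is not conducive to comparing model performance.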
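The least squares idea can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' procedure: it fits a two-parameter stand-in mean value function a(1 - e^{-bt}) (hypothetical parameters `a` and `b`) rather than the proposed five-parameter model, and it minimizes the sum of squared errors numerically (a closed form for `a` plus a one-dimensional search over `b`) instead of solving the partial-derivative equations analytically.

```python
import math


def mean_value(t, a, b):
    """Stand-in mean value function a * (1 - exp(-b * t)); illustration only."""
    return a * (1.0 - math.exp(-b * t))


def sse(a, b, times, observed):
    """Sum of squared errors between the model and the observed cumulative faults."""
    return sum((mean_value(t, a, b) - o) ** 2 for t, o in zip(times, observed))


def best_a_for(b, times, observed):
    """For fixed b the model is linear in a, so the optimal a has a closed form."""
    g = [1.0 - math.exp(-b * t) for t in times]
    return sum(o * gi for o, gi in zip(observed, g)) / sum(gi * gi for gi in g)


def fit_lse(times, observed, b_low=1e-4, b_high=2.0, grid=200, iters=60):
    """Least squares fit: coarse grid over b, then golden-section refinement."""
    def profile(b):
        return sse(best_a_for(b, times, observed), b, times, observed)

    step = (b_high - b_low) / grid
    candidates = [b_low + i * step for i in range(grid + 1)]
    i_best = min(range(len(candidates)), key=lambda i: profile(candidates[i]))
    lo = candidates[max(i_best - 1, 0)]
    hi = candidates[min(i_best + 1, grid)]

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if profile(c) < profile(d):
            hi = d
        else:
            lo = c
    b = (lo + hi) / 2.0
    return best_a_for(b, times, observed), b


if __name__ == "__main__":
    times = list(range(1, 21))
    # Synthetic, noise-free data generated from known parameters.
    observed = [mean_value(t, 120.0, 0.15) for t in times]
    print(fit_lse(times, observed))
```

For a model with more parameters, the same objective would normally be handed to a general nonlinear least squares routine; the search range for `b` here is an arbitrary choice for the example.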

A. OSS fault data sets
We use two fault data sets collected from the Apache Storm and Apache Chemistry OpenCMIS projects in the Apache issue tracking system (https://issues.apache.org/) to verify the fitting and predictive performances of the proposed model. Apache Storm is a free and open source distributed real-time computation system. It processes unbounded data streams in real time, much as Hadoop handles batch processing, and it is easy to use with any programming language. Apache Chemistry OpenCMIS simplifies the client and server development of content management interoperability services. It provides an API, an SPI, and testing tools, and it allows content server and application developers to focus on the ECM domain model without having to deal with the underlying communication rules.
The first fault data sets (DS1) from Apache Storm project include three releases, namely, STORM 1.0.1 (Release-1), STORM 1.0.2 (Release-2), and STORM 1.0.3 (Release-3), which are denoted as DS1-1, DS1-2, and DS1-3. The second fault data sets (DS2) from Apache Chemistry OpenCMIS project include three releases, namely, OpenCMIS 0.4.0 (Release-1), OpenCMIS 0.5.0 (Release-2), and OpenCMIS 0.6.0 (Release-3), which are denoted as DS2-1, DS2-2, and DS2-3. We use the first fault data sets (DS1) to compare the goodness-of-fit of the proposed model with those of other models. In addition, we use the second fault data sets (DS2) to validate the prediction performance of the proposed model. In issue tracking systems, each issue (fault) has multiple attributes, and each attribute has multiple sub-attributes. For example, the attribute of the issue (fault) mainly includes Type, Key, Summary, Assignee, Reporter, Priority, Status, Resolution, Created and Updated. Moreover, the sub-attribute of Status mainly includes OPEN, REOPENED, RESOLVED, and CLOSED. The sub-attribute of Resolution has Unresolved, Fixed, Duplicate, Not A Problem, Cannot Reproduce, and Would Not Be Fixed. Notably, fault data sets in this paper are collected from fixed version excluding the sub-attributes of faults (issues) with Unresolved, Duplicate, Not A Problem, Cannot Reproduce, and Would Not Be Fixed. We select fault data sets of OSS randomly to test the adaptability, stability, and robustness of the proposed model.

B. OSS comparison models and criteria
In this study, we use five OSS reliability models to compare with the proposed model. The five OSS reliability models include two types: one is a single release OSS reliability model, and the other is a multi-release OSS reliability model. For example, the Li model [2] and the Wang model [3] are the former, and the Yang multi-release model [9], the Singh multi-release model [5], and the Tandon multi-release model [6] are the latter. For comparison, we use Case 5 in the literature [5] as the Singh multi-release model, and Model-1 in the literature [6] as the Tandon multi-release model. We use four comparison criteria for the goodness of fit, namely, MSE [10], R^2 [10], TS [10], and Bias [10], for the fitting performance comparison. We use four comparison criteria for prediction, namely, PSSE [11], TS [10], Variance [10], and Bias [10], for the predictive performance comparison. Table 1 lists model comparison criteria in detail.
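The comparison criteria named above can be computed as in the sketch below. It uses definitions that are common in the software reliability literature and are assumed here; they may differ in detail from the formulas the paper takes from [10] and [11]. The function names and dictionary keys are illustrative.

```python
import math


def fit_criteria(observed, fitted):
    """Goodness-of-fit criteria over the k1 fitted points (assumed common definitions)."""
    k1 = len(observed)
    errors = [f - x for f, x in zip(fitted, observed)]
    sq_sum = sum(e * e for e in errors)
    mean_obs = sum(observed) / k1
    ss_tot = sum((x - mean_obs) ** 2 for x in observed)
    return {
        "MSE": sq_sum / k1,                       # mean squared error
        "R2": 1.0 - sq_sum / ss_tot,              # coefficient of determination
        "TS": 100.0 * math.sqrt(sq_sum / sum(x * x for x in observed)),  # Theil statistic, in percent
        "Bias": sum(errors) / k1,                 # mean signed error
    }


def prediction_criteria(observed, predicted, k1):
    """Prediction criteria over the points after the first k1 (at least two needed)."""
    obs, pred = observed[k1:], predicted[k1:]
    n = len(obs)
    errors = [p - x for p, x in zip(pred, obs)]
    sq_sum = sum(e * e for e in errors)
    bias = sum(errors) / n
    return {
        "PSSE": sq_sum,                           # predictive sum of squared errors
        "TS": 100.0 * math.sqrt(sq_sum / sum(x * x for x in obs)),
        "Bias": bias,
        # Spread of the prediction errors around their mean.
        "Variance": math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1)),
    }


if __name__ == "__main__":
    observed = [10, 20, 30, 40, 50]
    modelled = [12, 18, 33, 42, 49]
    print(fit_criteria(observed[:3], modelled[:3]))
    print(prediction_criteria(observed, modelled, k1=3))
```

Smaller MSE, TS, PSSE, Variance, and absolute Bias, and R2 closer to 1, indicate a better model under these definitions.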

C. Comparative analysis and discussion on the fitting performance of OSS
We use 100% of the fault data sets (DS1) to compare the fitting performance of the models. Tables 2-4 show that the proposed model has the best fitting performance among all models. The fitting performance of the proposed model is stable, whereas that of the other models is not. For example, the second-best model is the Singh multi-release model-1 in Table 2 but the Tandon multi-release model-3 in Table 4. Table 2 shows that the worst model is the Wang model, whose MSE is nearly five times as large as that of the proposed model. In Table 3, the worst is also the Wang model-2, and its MSE is nearly eleven times as large as that of the proposed model. Table 4 shows that the worst is the Yang multi-release model-3, and its MSE is approximately one hundred and fifty-two times larger than that of the proposed model.
Overall, the fitting performance of the proposed model is the best among all models. The fitting performance of the Singh multi-release model is equal to that of the Tandon multi-release model, and they are better than other models except the proposed model. In terms of the goodness of fit, the Wang model is equal to the Yang multi-release model, and they are worse than other models. Fig 1(A)-1(C) show the comparison results on the fitting performance of the selected models in this study.

Table 1. Model comparison criteria for OSS.
MSE: goodness-of-fit.
R^2: goodness-of-fit.
TS: goodness-of-fit and prediction.
Bias: goodness-of-fit and prediction.
In these criteria, X_{t_i} represents the actually observed failure data by time t_i, Λ(t_i) denotes the expected cumulative number of faults detected by time t_i, k_1 is the fitting sample size of the fault data set, k is the total sample size of the fault data set, and i = 1, 2, 3, ..., k_1, ..., k. Herein, PSSE denotes the average value of the deviation between the actually observed data and the predictive data.
https://doi.org/10.1371/journal.pone.0267171.t001

D. Comparative analysis and discussion on the predictive performance of OSS
We use 90% of DS2-1, 90% of DS2-2, and 80% of DS2-3, selected randomly, to compare the predictive performance of the models. In Table 5, the best model is the proposed model, the second is the Singh multi-release model-1, and the worst is the Li model based on PSSE, TS, Variance, and Bias comparisons. PSSE of the Li model is nearly twelve times as large as that of the proposed model. Table 6 shows that the best is the proposed model, the second is the Li model, and the worst is the Tandon multi-release model-2. PSSE of the proposed model is nearly three times lower than that of the Tandon multi-release model-2. Table 7 shows that the proposed model has the best predictive performance among all models. The second is the Li model, and the worst is the Singh multi-release model-3. PSSE of the proposed model is nearly fourteen times lower than that of the Li model. As shown in Fig 2, the predictive ability of the proposed model is better than that of other models. In summary, the predictive performance of the proposed model is the best among all models. From Tables 5-7, we can see that the proposed model has the best stability among all models, whereas the predictive performance of other models is not stable. We can conclude from Tables 2-7 that the proposed model has the best fitting and predictive performances. With changes in the OSS development and testing environment, the fitting and predictive performances of other models are uncertain and unstable. The confidence interval analysis shows that most fault data points fall within 95% confidence intervals. These findings indicate that the proposed model has good stability in parameter estimations.

VI. Sensitivity analysis
We conduct a sensitivity analysis of the model parameters to investigate their influence on the proposed model. Parameter sensitivity analysis varies one parameter while holding the other parameter values fixed and observes the resulting change in the model. Fig 4(A), 4(B), 4(D) and 4(E) show that parameters η, ψ, γ, and C of the proposed model are important parameters. Thus, the following conclusions can be drawn.
1. In the process of reliability modeling of OSS, fault detection is the key research object. Notably, the change in the fault detection rate (ψ) has an important influence on the proposed model.
2. Introduced fault total number (η) is also the key research object in establishing a high-quality reliability model of OSS. The change in fault introduction has an important influence on our proposed model.
3. The expected total number of detected faults is also crucial in evaluating the reliability accuracy of OSS. Furthermore, the expected total number of detected faults (C) has an important influence on our proposed model.
4. The change in the shape parameter (γ) in the proposed model has an important influence on the shape change of the proposed model.
As shown in Fig 4(C), the parameter μ of the proposed model is an inactive parameter, that is, a parameter without a significant effect. The reason is that fault introduction is a small probability event compared with fault detection. The shape parameter of the fault detection rate will change because the fault detection rate can vary in many ways. However, the fault introduction rate will be relatively stable and will not change significantly.
Overall, the establishment of a high-quality reliability model of OSS must consider the changes in fault detection and introduction. Moreover, when the proposed model is used to evaluate the reliability of OSS, we must focus on the changes in the parameters (η, ψ, γ, and C) of the proposed model.
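The one-at-a-time procedure used in this section can be sketched as follows. The example is an assumption-laden illustration: it uses the classical inflection S-shaped mean value function a(1 - e^{-bt}) / (1 + γe^{-bt}) as a stand-in (hypothetical parameters `a`, `b`, `gamma`), not the proposed model, and the relative changes are arbitrary.

```python
import math


def inflection_s_shaped(t, a, b, gamma):
    """Stand-in mean value function; illustration only, not the proposed model."""
    return a * (1.0 - math.exp(-b * t)) / (1.0 + gamma * math.exp(-b * t))


def one_at_a_time(model, base_params, name, rel_changes, times):
    """Vary one parameter by each relative change while the others stay fixed."""
    curves = {}
    for change in rel_changes:
        params = dict(base_params)  # copy, so the baseline stays untouched
        params[name] = base_params[name] * (1.0 + change)
        curves[change] = [model(t, **params) for t in times]
    return curves


def max_relative_shift(curves, baseline_key=0.0):
    """Largest relative deviation of each varied curve from the baseline curve."""
    base = curves[baseline_key]
    return {
        change: max(abs(v - b0) / b0 for v, b0 in zip(values, base))
        for change, values in curves.items()
    }


if __name__ == "__main__":
    base = {"a": 100.0, "b": 0.1, "gamma": 2.0}
    times = [1, 5, 10, 20, 40]
    for name in base:
        curves = one_at_a_time(inflection_s_shaped, base, name, [-0.2, 0.0, 0.2], times)
        print(name, max_relative_shift(curves))
```

A parameter whose curves barely move away from the baseline would be called inactive in the sense used above; a parameter that shifts the curve noticeably is an important one.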

VII. Threats to validity
The threats to validity of the proposed model mainly come from three aspects: threats to internal validity, external validity, and construct validity. These three aspects are discussed as follows.
Internal validity: In our study, the threats to internal validity have two influence factors. First, considering the complexity of OSS reliability modeling, we give an approximate analytical solution so that the proposed model is easy to use for evaluating software reliability in the actual OSS development and testing process. Therefore, the performance of the proposed model may be affected. Second, the least squares method is used to estimate the parameter values of the model, and the parameter estimates are also approximate values. Considering that the reliability evaluation of OSS is itself an approximate probability estimation, the accuracy of our proposed model can meet the requirements.
External validity: In this paper, there are two factors for the threats to external validity. First, only a few OSS fault data sets and types are used for the experiments, and they come from the same issue tracking system. However, the fault data sets used in our experiment come from different OSS projects, which meets the minimum requirements for model performance comparison. Although the faults come from the same organization, each fault in the issue tracking system has many attributes and sub-attributes. Therefore, the data can meet the basic requirements of software fault types. Second, we compare only a few OSS reliability models, but the models selected for comparison are representative.
Construct validity: In order to effectively evaluate and compare the performance of the model, we use six comparison criteria. These comparison criteria can effectively quantify the performance of the model and greatly reduce the threats to construct validity. Although we use only a few model comparison criteria, they are widely used in software reliability model comparison and evaluation.

VIII. Related work
In recent years, OSS has been widely used, and its reliability has been the focus of research. The established reliability model of OSS can be divided into the following two categories: perfect debugging (PD) and imperfect debugging (ID).
The PD reliability model of OSS is used when detected faults are removed in the process of OSS testing and no new faults are introduced. For example, Aggarwal et al. [12] proposed an OSS reliability model considering the changing point and fault reduction factor with the Weibull distribution. Kuo et al. [13] proposed an OSS reliability model considering two-parameter generalized Pareto distribution. Huang et al. [14] proposed the new OSS reliability model based on the bounded generalized Pareto distribution through improving the existing OSS reliability models with the generalized Pareto distribution. Tamura and Yamada [15][16][17] used stochastic differential equation to build the OSS reliability models. Tamura and Yamada [18,19] used deterministic chaos theory to establish the OSS reliability models. Yamada and Tamura [20] proposed a few OSS reliability models considering the influence in OSS modeling. Lee et al. [21] established a prediction model, which is a multivariate linear regression analysis model, by using the OSS life cycle measurement method. Lin and Li [22] proposed a modified queuing theory to establish the corresponding OSS reliability model, which is called rate-based queuing simulation model.
The ID reliability model of OSS is used when the detected faults are removed and new faults may be introduced. For example, Saraf and Iqbal [23] proposed a multi-release ID software reliability model with changing-point. Considering ID and changing-point, Khurshid et al. [24] proposed a multi-release framework of OSS reliability modeling. In addition, Saraf et al. [25] proposed a multi-release software reliability framework not only incorporating ID and change points but also considering fault detection and correction. Considering various levels of faults, Sun and Li [26] proposed the corresponding software reliability model of ID. Considering fault dependency and changing-point, Chatterjee et al. [27] proposed an ID reliability model for multi-upgradation software. In this study, we propose an ID reliability model of OSS considering that fault introduction decreases over time. In the testing process of OSS, fault introduction and detection are complex and random processes. Moreover, fault introduction gradually decreases during OSS testing. Pradhan et al. [28] proposed two ID software reliability models considering the delay between fault detection and introduction and fault decreasing factors with the general inflection S shape. By studying the change-point and ID phenomena during the OSS test, Khurshid et al. [29] put forward a corresponding framework. Considering the different fault severity levels in bug tracking systems, Yanagisawa et al. [30] proposed two kinds of OSS reliability growth models based on a hazard rate. One is a PD software reliability model, and the other is an ID software reliability model.
From the multi-release perspective of OSS, many researchers have established the corresponding OSS reliability models [31,32]. For example, Singh et al. [5] studied the changes of codes and files for OSS during the development and test. They established the corresponding multi-release reliability models of OSS considering using a Cobb-Douglas function to integrate the code changing entropy and testing time. Yang et al. [9] proposed a multi-release reliability framework of OSS considering the delay between fault detection and correction. Garmabaki et al. [33] divided the software testing process into internal and parallel testing. They assumed that faults in current version consist of the remaining faults in the previous version and generated faults due to increasing and modifying components and functions in the current version. Thus, they proposed a multi-release reliability model for OSS. Nijhawan et al. [34] assumed that some errors are added into codes of OSS during up-gradation, and some faults are introduced in the current version besides remnant faults in the previous version. They proposed an ID multi-release reliability model for OSS. Pradhan et al. [35] assumed that fault detection of OSS followed a generalized distribution considering a modified Weibull process, and proposed a multi-release reliability growth model for OSS. Aggarwal et al. [36] considered that the differences in expertise, skills, testing resources, and learning ability of volunteers and users result in a sudden change in the number of detected faults over time. This phenomenon can be called a changing point in the fault detection rate. Therefore, they proposed a multi-release OSS reliability model based on the changing point. Saraf et al. [37] proposed several OSS reliability models considering changing points and ID. Diwakar and Aggarwa [38] considered that faults in the current version comprise two parts. One comprises a few remaining faults in the previous version, and the other is composed of some newly generated faults given that codes, functions, and components are modified in the current version. They proposed a related multi-release reliability model for OSS.

IX. Conclusions
In this study, we propose an OSS reliability model in which the fault introduction rate decreases over time. Experimental results indicate that the proposed model has better fitting and predictive performances than other models. The parameter sensitivity analysis of the proposed model shows that, except parameter μ, other parameters of the proposed model are important parameters. The proposed model can be used for actual OSS reliability evaluation and residual fault prediction. Considering the complexity and uncertainty of OSS development and testing environment, we will deeply study OSS reliability modeling under those environments in the future.